4. An Analysis of Gender Bias in K-12 Assigned Literature Through Comparison of Non-Contextual Word Embedding Models
Abstract:
Thesis (Master's)--University of Washington, 2021. Word embeddings are mathematical representations of words computed from a group of texts on which a machine learning model is trained. Generally, words that are semantically similar to each other lie closer together in the vector space created by the embedding model. The distances between words can therefore be analyzed to find which words tend to be used in the same contexts in a given group of texts. In this thesis, I use three different non-contextual methods of training word embedding models, Word2Vec (Mikolov et al., 2013), FastText (Bojanowski et al., 2017), and GloVe (Pennington et al., 2014), on a corpus of literature assigned to students in grades K-12 in the United States to answer three questions:
- It has been shown that children are particularly prone to internalizing biases in the content they read and watch (Railsback, 1993; Jacobs, 2003; Slater, 2003). What biases are present in literature assigned to children in grades K-12 in the United States?
- Are different kinds of non-contextual word embeddings sensitive to bias in different ways?
- Is the text from a single book enough to detect bias using non-contextual word embedding models?
I find that GloVe embeddings are more sensitive to biases in smaller corpora, while Word2Vec and FastText are more sensitive to biases in larger corpora. When looking at the word embeddings from a single book, I see variation in how strongly gendered the "most gendered" words are: a book with stronger gender biases (as determined through literary critique) had more strongly gendered words than a book that subverted gender biases (also determined through literary critique).
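The abstract's two core ideas, that closeness in vector space reflects shared contexts and that some words come out as "most gendered", can be illustrated with a minimal Python sketch. It assumes cosine similarity as the distance measure and a simple score, cos(w, "he") minus cos(w, "she"), as the gender measure; both are common choices but are assumptions here, not necessarily the method used in the thesis. The vectors in `EMBEDDINGS` are hypothetical hand-picked toy values, not trained embeddings.

```python
import math

# Hypothetical toy embeddings (NOT from the thesis): 3-dimensional vectors
# chosen by hand purely for illustration. Real Word2Vec, FastText, or GloVe
# vectors typically have hundreds of dimensions and are learned from a corpus.
EMBEDDINGS: dict[str, list[float]] = {
    "he": [1.0, 0.1, 0.0],
    "she": [-1.0, 0.1, 0.0],
    "brave": [0.6, 0.8, 0.1],
    "gentle": [-0.5, 0.8, 0.2],
    "book": [0.0, 0.2, 1.0],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_score(word: str, emb: dict[str, list[float]]) -> float:
    """Assumed metric: positive means closer to 'he', negative closer to 'she'."""
    return cosine(emb[word], emb["he"]) - cosine(emb[word], emb["she"])


def most_gendered(emb: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Rank non-pronoun words by the magnitude of their gender score."""
    scores = [(w, gender_score(w, emb)) for w in emb if w not in ("he", "she")]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)[:k]
```

On these toy vectors, `most_gendered(EMBEDDINGS)` ranks "brave" (closer to "he") ahead of "gentle" (closer to "she"), while "book" scores zero. With real models the vectors would come from a trained Word2Vec, FastText, or GloVe model, and a more robust measure might average over several gendered word pairs rather than the single pair used here.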
Keyword:
Children's Literature; Computer science; Linguistics; Literary Bias; Machine Learning Bias; Natural Language Processing; Social Bias; Social research; Word Embeddings
URL: http://hdl.handle.net/1773/46827

5. Assembling Syntax: Modeling Constituent Questions in a Grammar Engineering Framework

6. Collecting and using race and ethnicity information in linguistic studies

7. Tracing and Reducing Lexical Ambiguity in Automatically Inferred Grammars

9. A Finite-State Morphological Analyzer for Central Alaskan Yup'ik

10. Inferring Grammars from Interlinear Glossed Text: Extracting Typological and Lexical Properties for the Automatic Generation of HPSG Grammars

11. Linguistic fundamentals for natural language processing II: 100 essentials from semantics and pragmatics

13. Braiding Language (by Computer): Lushootseed Grammar Engineering

14. Modeling Clausal Complementation for a Grammar Engineering Resource
In: Proceedings of the Society for Computation in Linguistics (2019)

18. Incorporating deep visual features into multiobjective based multi-view search results clustering
In: Mitra, Sayantan, Hasanuzzaman, Mohammed orcid:0000-0003-1838-0091, Saha, Sriparna and Way, Andy orcid:0000-0001-5736-5930 (2018) Incorporating deep visual features into multiobjective based multi-view search results clustering. In: 27th International Conference on Computational Linguistics, 20-26 Aug 2018, Santa Fe, NM, USA. (2018)

19. Recurrent One-Hop Predictions for Reasoning over Knowledge Graphs

20. A Parametric Implementation of Valence-changing Morphology in the LinGO Grammar Matrix